JHU_GL_logo.png

Project 1 - DualLens Analytics

image.png

Background Story

In the rapidly evolving world of finance and technology, investors are constantly seeking ways to make smarter decisions by combining traditional financial analysis with emerging technological insights. While stock market trends provide a numerical perspective on growth, an organization’s initiatives in cutting-edge fields like Artificial Intelligence (AI) reveal its future readiness and innovation potential. However, analyzing both dimensions - quantitative financial performance and qualitative AI initiatives - requires sifting through multiple, diverse data sources: stock data from platforms like Yahoo Finance, reports in PDFs, and contextual reasoning using Large Language Models (LLMs).

This is where DualLens Analytics comes in. By applying a dual-lens approach, the project leverages Retrieval-Augmented Generation (RAG) to merge financial growth data with strategic insights from organizational reports. Stock data provides evidence of stability and momentum, while AI initiative documents reveal forward-looking innovation. Together, they form a richer, more holistic picture of organizational potential.

With DualLens Analytics, investors no longer need to choose between numbers and narratives: they gain a unified, AI-driven perspective that ranks organizations by both financial strength and innovation readiness, enabling smarter, future-focused investment strategies.

Problem Statement

Traditional investment analysis often focuses on financial metrics alone (e.g., stock growth, revenue, market cap), missing the qualitative dimension of how prepared a company is for the future. On the other hand, qualitative documents like strategy PDFs contain valuable insights about innovation and AI initiatives, but they are difficult to structure, query, and integrate with numeric financial data.

This leads to three core challenges:

  1. Fragmented Data Sources: Financial data (stock prices) and strategic insights (PDFs) exist in silos.

  2. Limited Analytical Scope: Manual analysis of growth trends and PDF reports is time-consuming and error-prone.

  3. Decisional Blind Spots: Without integrating both quantitative (growth trends) and qualitative (AI initiatives) signals, investors may miss out on high-potential organizations.

Solution Approach

To address this challenge, we set out to build a Retrieval-Augmented Generation (RAG) powered system that blends financial trends with AI-related strategic insights, helping investors rank organizations based on growth trajectory and innovation capacity.
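
As an illustration only (not part of the graded solution), the dual-lens idea can be sketched as a weighted blend of the two signals: a quantitative growth percentage and a qualitative, LLM-derived innovation score. The 0.6/0.4 weights, the 0–100% growth range, and the 0–10 innovation scale below are all assumptions made for this sketch.

```python
# Illustrative sketch of dual-lens ranking. Weights and score ranges are
# assumptions, not values prescribed by the project.

def dual_lens_score(growth_pct: float, innovation_score: float,
                    w_growth: float = 0.6, w_innovation: float = 0.4) -> float:
    """Blend stock growth (%) and an innovation score (0-10) into one score."""
    g = max(0.0, min(growth_pct / 100.0, 1.0))       # normalize assumed 0-100% range
    i = max(0.0, min(innovation_score / 10.0, 1.0))  # normalize assumed 0-10 scale
    return w_growth * g + w_innovation * i

# Rank two hypothetical organizations by their combined score.
signals = {"GOOGL": (42.0, 9.0), "IBM": (12.0, 6.5)}
ranked = sorted(signals.items(),
                key=lambda kv: dual_lens_score(*kv[1]),
                reverse=True)
```

In this toy ranking, a company with strong growth and a high innovation score rises to the top even if neither signal alone would distinguish it.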

image.png

NOTE

Look for "--- --- ---" in the cells below and add your code there; it is a placeholder.

Setting up Installations and Imports

In [2]:
# @title Run this cell => Restart the session => Start executing the below cells **(DO NOT EXECUTE THIS CELL AGAIN)**

!pip install langchain==0.3.25 \
                langchain-core==0.3.65 \
                langchain-openai==0.3.24 \
                chromadb==0.6.3 \
                langchain-community==0.3.20 \
                pypdf==5.4.0
Successfully installed asgiref-3.11.0 backoff-2.2.1 bcrypt-5.0.0 build-1.3.0 chroma-hnswlib-0.7.6 chromadb-0.6.3 coloredlogs-15.0.1 dataclasses-json-0.6.7 durationpy-0.10 httptools-0.7.1 humanfriendly-10.0 kubernetes-34.1.0 langchain-0.3.25 langchain-community-0.3.20 langchain-core-0.3.65 langchain-openai-0.3.24 langchain-text-splitters-0.3.8 langsmith-0.3.45 marshmallow-3.26.1 mypy-extensions-1.1.0 onnxruntime-1.23.2 openai-1.109.1 opentelemetry-api-1.39.1 opentelemetry-exporter-otlp-proto-common-1.39.1 opentelemetry-exporter-otlp-proto-grpc-1.39.1 opentelemetry-instrumentation-0.60b1 opentelemetry-instrumentation-asgi-0.60b1 opentelemetry-instrumentation-fastapi-0.60b1 opentelemetry-proto-1.39.1 opentelemetry-sdk-1.39.1 opentelemetry-semantic-conventions-0.60b1 opentelemetry-util-http-0.60b1 packaging-24.2 posthog-7.4.0 pypdf-5.4.0 pypika-0.48.9 pyproject_hooks-1.2.0 typing-inspect-0.9.0 urllib3-2.3.0 uvloop-0.22.1 watchfiles-1.1.1 wrapt-1.17.3 zstandard-0.23.0
In [1]:
import yfinance as yf              # Used for gathering stock prices
import matplotlib.pyplot as plt    # Used for Data Visualization / Plots / Graphs
import pandas as pd                # Helpful for working with tabular data like DataFrames
import os                          # Interacting with the operating system

from langchain.text_splitter import RecursiveCharacterTextSplitter      #  Helpful in splitting the PDF into smaller chunks
from langchain_community.document_loaders import PyPDFDirectoryLoader, PyPDFLoader     # Loading a PDF
from langchain_community.vectorstores import Chroma    # Vector DataBase

1. Organization Selection

Selecting the five organizations below as the analysis pool.

In [2]:
companies = ["GOOGL", "MSFT", "IBM", "NVDA", "AMZN"]

2. Setting up the LLM - 1 Mark

  • The config.json file should contain API_KEY and API BASE URL provided by OpenAI.
  • You need to insert your actual API keys and endpoint URL obtained from your Olympus account. Refer to the OpenAI Access Token documentation for more information on how to generate and manage your API keys.
  • This code reads the config.json file and extracts the API details.
    • The API_KEY is a unique secret key that authorizes your requests to OpenAI's API.
    • The OPENAI_API_BASE is the API BASE URL where the model will process your requests.

What To Do?

  • Use the sample config.json file provided.
  • Add your OpenAI API Key and Base URL to the file.
  • The config.json should look like this:

    {
      "API_KEY": "your_openai_api_key_here",
      "OPENAI_API_BASE": "https://your_openai_api_base/v1"
    }
In [3]:
#Loading the `config.json` file
import json
import os

# Load the JSON file and extract values
file_name = "config.json"
with open(file_name, 'r') as file:
    config = json.load(file)
    os.environ['OPENAI_API_KEY'] = config["API_KEY"] # Loading the API Key
    os.environ["OPENAI_BASE_URL"] = config["OPENAI_API_BASE"] # Loading the API Base Url
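Before setting the environment variables, a small fail-fast check can catch a malformed config.json early. This helper is a sketch, not part of the assignment; `REQUIRED_KEYS` simply mirrors the two keys in the sample config above:

```python
import json

REQUIRED_KEYS = {"API_KEY", "OPENAI_API_BASE"}

def validate_config(raw: str) -> dict:
    """Parse the config JSON and raise immediately if a required key is missing."""
    config = json.loads(raw)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise KeyError(f"config.json is missing: {sorted(missing)}")
    return config

# Illustrative values only, not real credentials
sample = '{"API_KEY": "sk-test", "OPENAI_API_BASE": "https://example.com/v1"}'
print(validate_config(sample)["API_KEY"])  # → sk-test
```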
In [4]:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
    model="gpt-4o-mini",     # "gpt-4o-mini" to be used as the LLM
    temperature=0,           # Set the temperature to 0 for deterministic output
    max_tokens=5000,         # Set max_tokens=5000 so that long responses are not clipped off
    top_p=0.95,
    frequency_penalty=1.2,
    stop_sequences=['INST']
)

3. Visualization and Insight Extraction - 5 Marks

generated-image (2) (1).png

Gather stock data for each selected organization from the past three years using the YFinance library, and visualize this data for enhanced analysis.

**Your Task**

  1. Loop through each company to retrieve stock data of the last three years using the YFinance library.
  2. Plot the closing prices for each company.
In [5]:
plt.figure(figsize=(14,7))

# Loop through each company and plot closing prices
for symbol in companies:
    ticker = yf.Ticker(symbol)
    data = ticker.history(period="3y")

    # Plot closing price
    plt.plot(data.index, data['Close'], label=symbol)

plt.title("Stock Price Trends (Last 3 Years)")
plt.xlabel("Date")
plt.ylabel("Price (USD)")
plt.legend()
plt.grid(True)
plt.savefig("Stock_Price_Trends_3Y.png")
plt.show()
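A natural follow-up is to quantify each trend as the total percentage growth from the first to the last close. A minimal sketch on a synthetic series (the same function would apply to each ticker's `data['Close']` above):

```python
import pandas as pd

def percent_growth(close: pd.Series) -> float:
    """Total percentage change from the first to the last closing price."""
    return (close.iloc[-1] / close.iloc[0] - 1) * 100

# Synthetic example: a stock climbing from 100 to 150 over five sessions
close = pd.Series([100.0, 110.0, 120.0, 135.0, 150.0])
print(round(percent_growth(close), 1))  # → 50.0
```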

Financial Metrics

  1. Market Cap: Total market value of a company’s outstanding shares.
  2. P/E Ratio: Shows how much investors are willing to pay per dollar of earnings.
  3. Dividend Yield: Annual dividend income as a percentage of the stock price.
  4. Beta: Measures a stock’s volatility relative to the overall market.
  5. Total Revenue: The total income a company generates from its business operations.

**Your Task**

  1. Loop through all the companies to fetch data based on the specified financial metrics.
  2. Create a DataFrame (DF) from the collected data.
  3. Visualize and compare each financial metric across all companies.
  4. For example, visualize and compare the market capitalization for each company.

Tip: Check ticker.info for the available financial metrics

In [6]:
import pandas as pd
import matplotlib.pyplot as plt

companies = ["GOOGL", "MSFT", "IBM", "NVDA", "AMZN", "META"]
metrics_list = {}

# Fetching the financial metrics
for symbol in companies:                          # Loop through all the companies
    ticker = yf.Ticker(symbol)
    info = ticker.info
    metrics_list[symbol] = {                              # Define the dictionary of all the Financial Metrics
        "Market Cap": info.get("marketCap", 0),
        "P/E Ratio": info.get("trailingPE", 0),
        "P/E Growth Ratio": info.get("trailingPegRatio", 0),
        "Total Revenue": info.get("totalRevenue", 0),
        "Return on Equity (ROE)": info.get("returnOnEquity", 0),
        "Free Cash Flow": info.get("freeCashflow", 0),
        "Price-to-Book (P/B) Ratio": info.get("priceToBook", 0),
        "Debt-to-Equity Ratio": info.get("debtToEquity", 0),
        "Dividend Yield": info.get("dividendYield", 0),
        "Beta": info.get("beta", 0)
    }
In [7]:
# Convert to DataFrame
df = pd.DataFrame(metrics_list).T

# Converting large numbers to billions for readability by dividing the whole column by 1e9
df["Market Cap"] = df["Market Cap"] / 1e9
df["Total Revenue"] = df["Total Revenue"] / 1e9
df["Free Cash Flow"] = df["Free Cash Flow"] / 1e9
df["Dividend Yield"] = df["Dividend Yield"] * 100  # Convert to percentage

df   # Printing the df
Out[7]:
       Market Cap  P/E Ratio  P/E Growth Ratio  Total Revenue  Return on Equity (ROE)  Free Cash Flow  Price-to-Book (P/B) Ratio  Debt-to-Equity Ratio  Dividend Yield   Beta
GOOGL 3720.359444  30.291912            1.6296     385.476002                 0.35450       47.997751                   9.588861                11.424            27.0  1.070
MSFT  3611.924365  34.609688            1.9865     293.812011                 0.32241       53.327376                   9.949223                33.154            75.0  1.070
IBM    281.336611  35.830956            2.0596      65.401999                 0.30156       11.757500                  10.082069               237.831           223.0  0.689
NVDA  4406.563570  44.799507            0.6921     187.141997                 1.07359       53.282873                  36.997140                 9.102             2.0  2.284
AMZN  2430.420648  32.157000            1.5384     691.330023                 0.24327       26.080000                   6.573279                43.405             0.0  1.372
META  1660.448014  29.149115            1.5557     189.458006                 0.32643       18.617750                   8.557677                26.311            32.0  1.273
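Because the metrics sit on very different scales (billions of dollars vs. unitless ratios), min-max normalization makes cross-metric comparison fairer. A toy sketch, with an illustrative frame rather than the real data above:

```python
import pandas as pd

def min_max_normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Scale every column to [0, 1] so metrics with different units are comparable."""
    return (df - df.min()) / (df.max() - df.min())

# Toy frame with two metrics on very different scales
toy = pd.DataFrame({"Market Cap": [100.0, 300.0, 500.0],
                    "Beta": [0.5, 1.0, 1.5]},
                   index=["A", "B", "C"])
normalized = min_max_normalize(toy)
print(normalized.loc["B"])  # both columns map company B to 0.5
```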
In [8]:
import matplotlib.pyplot as plt
import math

metrics_to_plot = df.columns.tolist()

colors = [
    "tab:blue", "tab:orange", "tab:green", "tab:red", "tab:purple",
    "tab:brown", "tab:pink", "tab:gray", "tab:olive", "tab:cyan"
]

n_cols = 3
n_rows = math.ceil(len(metrics_to_plot) / n_cols)

fig, axes = plt.subplots(
    n_rows,
    n_cols,
    figsize=(18, 4 * n_rows)   # Scale the figure height with the number of rows
)

axes = axes.flatten()

for i, metric in enumerate(metrics_to_plot):
    ax = axes[i]
    ax.bar(df.index, df[metric], color=colors[i % len(colors)])
    ax.set_title(f"{metric} Comparison")
    ax.set_ylabel(metric)
    ax.set_xlabel("Company")
    ax.grid(axis='y')

# Remove unused subplots
for j in range(i + 1, len(axes)):
    fig.delaxes(axes[j])

plt.tight_layout(pad=2.0)
plt.show()

4. RAG-Driven Analysis - 7 Marks

generated-image (1) (1).png

Performing the RAG-Driven Analysis on the AI Initiatives of the companies

**Your Task**

  1. Extract all PDF files from the provided ZIP file.
  2. Read the content from each PDF file.
  3. Split the content into manageable chunks.
  4. Store the chunks in a vector database using embedding functions.
  5. Implement a query mechanism on the vector database to retrieve results based on user queries regarding AI initiatives.
  6. Evaluate the LLM generated response using LLM-as-Judge

A. Loading Company AI Initiative Documents (PDFs) - 1 mark

In [9]:
# Unzipping the AI Initiatives Documents
import zipfile
with zipfile.ZipFile("/content/pdf_data/Companies-AI-Initiatives.zip", 'r') as zip_ref:
  zip_ref.extractall("/content/pdf_data")         # Storing all the unzipped contents in this location
In [10]:
# Path of all AI Initiative Documents
ai_initiative_pdf_paths = [f"/content/pdf_data/Companies-AI-Initiatives/{file}" for file in os.listdir("/content/pdf_data/Companies-AI-Initiatives")]
ai_initiative_pdf_paths
Out[10]:
['/content/pdf_data/Companies-AI-Initiatives/MSFT.pdf',
 '/content/pdf_data/Companies-AI-Initiatives/IBM.pdf',
 '/content/pdf_data/Companies-AI-Initiatives/NVDA.pdf',
 '/content/pdf_data/Companies-AI-Initiatives/GOOGL.pdf',
 '/content/pdf_data/Companies-AI-Initiatives/AMZN.pdf']
In [11]:
from langchain_community.document_loaders import PyPDFDirectoryLoader
loader = PyPDFDirectoryLoader(path = "/content/pdf_data/Companies-AI-Initiatives/")          # Creating a PDF loader object
In [12]:
# Defining the text splitter
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name='cl100k_base',
    chunk_size=1000,
    chunk_overlap=200
)
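The interplay of chunk_size and chunk_overlap can be illustrated with a character-level sliding window. This is not the tiktoken-based RecursiveCharacterTextSplitter itself, just a sketch of how consecutive chunks share an overlap (it assumes chunk_overlap < chunk_size):

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows where each chunk repeats the last
    `chunk_overlap` characters of its predecessor."""
    step = chunk_size - chunk_overlap   # assumes chunk_overlap < chunk_size
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

sample = "abcdefghij" * 3   # 30 characters
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=3)
print(len(chunks))                       # → 4
print(chunks[0][-3:] == chunks[1][:3])   # → True: consecutive chunks overlap
```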
In [13]:
# Splitting the chunks using the text splitter
ai_initiative_chunks = loader.load_and_split(text_splitter)
In [14]:
# Total length of all the chunks
len(ai_initiative_chunks)
Out[14]:
62

B. Vectorizing AI Initiative Documents with ChromaDB - 1 mark

In [15]:
# Defining the 'text-embedding-ada-002' as the embedding model
from langchain_openai import OpenAIEmbeddings
embedding_model = OpenAIEmbeddings(model="text-embedding-ada-002")
In [16]:
#  Creating a Vectorstore, storing all the above created chunks using an embedding model
vectorstore = Chroma.from_documents(
    ai_initiative_chunks,
    embedding_model,
    collection_name="AI_Initiatives"
)

# Ignore if it gives an error or warning
ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given
ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given

You can safely ignore this error. It is a known, harmless telemetry issue in Chroma and does NOT affect your vector store, embeddings, or retrieval: Chroma tries to send anonymous usage telemetry, but a version mismatch between Chroma and its telemetry dependency (posthog) makes the call fail. The vectorstore is still created successfully.

✅ Your data is stored ✅ Embeddings work ✅ Similarity search works

In [17]:
# Creating a retriever object that fetches the ten most similar results from the vectorstore
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={'k': 10}
)
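Under the hood, similarity search ranks chunk embeddings by cosine similarity to the query embedding. A toy sketch with 2-D vectors (real text-embedding-ada-002 vectors have 1536 dimensions, and the chunk names here are illustrative):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

query = [1.0, 0.0]
chunks = {"chunk_a": [0.9, 0.1], "chunk_b": [0.1, 0.9]}

# Rank chunks by similarity to the query, highest first (what k=10 retrieval does at scale)
ranked = sorted(chunks, key=lambda c: cosine_similarity(query, chunks[c]), reverse=True)
print(ranked[0])  # → chunk_a
```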

C. Retrieving relevant Documents - 3 marks

In [18]:
user_message = "Give me the best project that `IBM` company is working upon"
In [19]:
# Building the context for the query using the retrieved chunks
relevant_document_chunks = retriever.invoke(user_message)
context_list = [d.page_content for d in relevant_document_chunks]
context_for_query = ". ".join(context_list)
In [20]:
len(relevant_document_chunks)
Out[20]:
10
In [21]:
# Write a system message for an LLM to help craft a response from the provided context
qna_system_message = """
You are an assistant whose work is to review the articles and provide the appropriate answers from the context.
User input will have the context required by you to answer user questions.
This context will begin with the token: ###Context.
The context contains references to specific portions of a document relevant to the user query.

User questions will begin with the token: ###Question.

Please answer only using the context provided in the input. Do not mention anything about the context in your final answer.

If the answer is not found in the context, respond "I don't know".
"""
In [22]:
# Write an user message template which can be used to attach the context and the questions
qna_user_message_template = """
###Context
Here are some documents that are relevant to the question mentioned below.
{context}

###Question
{question}
"""
In [23]:
# Format the prompt
formatted_prompt = f"""[INST]{qna_system_message}\n
                {'user'}: {qna_user_message_template.format(context=context_for_query, question=user_message)}
                [/INST]"""
In [24]:
# Make the LLM call
resp = llm.invoke(formatted_prompt)
resp.content
Out[24]:
'IBM is currently focusing on the development of the Granite family of AI models, which are open-source, high-performance AI foundation models designed to empower enterprise applications across various industries. These models aim to provide efficient, customizable, and scalable AI capabilities that can be integrated into business workflows while maintaining control over data and ensuring responsible use of AI technologies.'
In [25]:
# Define RAG function
def RAG(user_message):
    """
    Args:
        user_message: A user query to be answered from the vector DB.
    Returns:
        The LLM's answer, generated from the retrieved context.
    """
    # Retrieve the most similar chunks and join them into a single context string
    relevant_document_chunks = retriever.invoke(user_message)
    context_list = [d.page_content for d in relevant_document_chunks]
    context_for_query = ". ".join(context_list)

    # Combine qna_system_message and qna_user_message_template to create the prompt
    prompt = f"""[INST]{qna_system_message}\n
                {'user'}: {qna_user_message_template.format(context=context_for_query, question=user_message)}
                [/INST]"""

    # Querying the LLM
    try:
        response = llm.invoke(prompt)
        return response.content
    except Exception as e:
        return f'Sorry, I encountered the following error: \n {e}'
In [26]:
# Test Cases
print(RAG("How is the area in which GOOGL is working different from the area in which MSFT is working?"))
Google (GOOGL) focuses on a broad range of AI applications, including natural language processing, computer vision, speech recognition, and generative AI through initiatives like Gemini and Project Astra. Their efforts are integrated into consumer products such as Google Search, Gmail, and Google Assistant while also enhancing enterprise solutions via Google Cloud’s Vertex AI.

In contrast, Microsoft (MSFT) emphasizes software productivity tools with its AI capabilities embedded in Microsoft 365 applications like Word and Excel through initiatives such as Copilot. Microsoft's approach includes partnerships with OpenAI and the development of Azure AI to enhance enterprise solutions.

Overall, GOOGL is more focused on integrating advanced AI across various consumer-facing products while MSFT centers its efforts around improving productivity within business applications.
In [27]:
print(RAG("What are the three projects on which MSFT is working upon?"))
Microsoft is working on the following three projects:

1. Azure AI Foundry Labs: An experimental AI platform aimed at accelerating the translation of advanced AI research into real-world applications, providing a collaborative hub for developers and enterprises.

2. Microsoft 365 Copilot: An AI-powered productivity assistant embedded across Microsoft 365 applications to enhance productivity by automating tasks and providing intelligent assistance.

3. GitHub Copilot & IntelliCode: GitHub Copilot is an AI-powered coding assistant that suggests code completions and supports conversational assistance, while IntelliCode offers context-aware suggestions within Visual Studio to improve developer productivity.
In [28]:
print(RAG("What is the timeline of each project in NVDA?"))
For Project G-Assist:
- **Concept & Demo Phase**: Early prototypes were teased in NVIDIA showcases tied to RTX AI initiatives.
- **Public Availability**: G-Assist became accessible via the NVIDIA App in 2024–2025, marking the first time consumers could interact with the assistant at scale.
- **Iterative Updates**: Throughout 2024 and 2025, NVIDIA improved memory efficiency, broadened GPU compatibility, and launched plugin SDKs.

For DLSS 4:
The timeline specifics for DLSS 4 are not explicitly mentioned in the provided context. However, it is noted that as of 2025, DLSS 4 is fully available and integrated into many new AAA titles. 

For Project Olympus:
The exact start date of Olympus is not publicly disclosed but reports suggest development has been ongoing since at least 2023. It is expected to be unveiled at the upcoming AWS re:Invent conference typically held in late November or early December.

Overall timelines for each project indicate active development phases with specific milestones related to public availability and updates.
In [29]:
print(RAG("What are the areas in which AMZN is investing when it comes to AI?"))
Amazon is investing in several areas related to AI, including:

1. **Retail**: Enhancing product recommendations, dynamic pricing, fraud detection, and supply chain optimization.
2. **Amazon Web Services (AWS)**: Offering AI and machine learning tools for building intelligent applications.
3. **Voice Assistants**: Innovations like Alexa that understand speech and perform tasks.
4. **Robotics**: Streamlining order fulfillment in warehouses through robotics technology.
5. **Generative AI Applications**: Developing platforms like Amazon Bedrock for building generative AI applications and the Olympus project for multimodal AI capabilities.

These investments aim to make services smarter, more efficient, and more convenient for customers and businesses alike across various industries such as finance, healthcare, retail, and media.
In [30]:
print(RAG("What are the risks associated with projects within GOOG?"))
The risks associated with projects within Google (GOOG) include:

1. **Privacy Concerns**: Processing live video and audio data raises significant privacy issues, necessitating robust data protection measures.
2. **Technical Hurdles**: Achieving real-time, accurate multimodal understanding requires overcoming complex AI and hardware challenges.
3. **User Acceptance**: Gaining user trust and acceptance for a new form of AI assistant that interacts in more personal and potentially intrusive ways.
4. **Regulatory Compliance**: Navigating the evolving landscape of AI regulations and ensuring compliance with global standards.
5. **Model Safety**: Hallucinations and factual inaccuracies remain a risk, requiring constant evaluation and moderation.
6. **Regulatory Scrutiny**: Integrations could attract antitrust scrutiny or other regulatory challenges.
7. **Compute Costs**: High-performing models require significant energy and infrastructure, increasing operational costs.
8. **Competition**: Maintaining differentiation amid competitors like OpenAI, Meta, Anthropic, etc.

These risks highlight the complexities involved in developing advanced technologies while ensuring safety, compliance, user trust, and competitive positioning in the market.

D. Evaluation of the RAG - 2 marks

In [31]:
# Writing a question for performing evaluations on the RAG
evaluation_test_question = "What are the three projects on which MSFT is working upon?"
In [32]:
# Building the context for the evaluation test question using the retrieved chunks
relevant_document_chunks = retriever.invoke(evaluation_test_question)
context_list = [d.page_content for d in relevant_document_chunks]
context_for_query = ". ".join(context_list)
In [33]:
# Default RAG Answer
answer = RAG(evaluation_test_question)
print(answer)
Microsoft is working on the following three projects:

1. Azure AI Foundry Labs: An experimental AI platform aimed at accelerating the translation of advanced AI research into real-world applications, providing a collaborative hub for developers and enterprises.

2. Microsoft 365 Copilot: An AI-powered productivity assistant embedded across Microsoft 365 applications to enhance productivity by automating tasks and providing intelligent assistance.

3. GitHub Copilot & IntelliCode: GitHub Copilot is an AI-powered coding assistant that suggests code completions and reviews pull requests, while IntelliCode offers context-aware suggestions within Visual Studio to improve developer productivity.
In [35]:
# Defining a user message template for evaluation
evaluation_user_message_template = """
###Question
{question}

###Context
{context}

###Answer
{answer}
"""
1. Groundedness
In [36]:
# Writing the system message and the evaluation metrics for checking the groundedness
groundedness_rater_system_message = """
You are tasked with rating AI generated answers to questions posed by users.
You will be presented a question, context used by the AI system to generate the answer and an AI generated answer to the question.
In the input, the question will begin with ###Question, the context will begin with ###Context while the AI generated answer will begin with ###Answer.

Evaluation criteria:
The task is to judge the extent to which the metric is followed by the answer.
1 - The metric is not followed at all
2 - The metric is followed only to a limited extent
3 - The metric is followed to a good extent
4 - The metric is followed mostly
5 - The metric is followed completely

Metric:
The answer should be derived only from the information presented in the context

Instructions:
1. First write down the steps that are needed to evaluate the answer as per the metric.
2. Give a step-by-step explanation if the answer adheres to the metric considering the question and context as the input.
3. Next, evaluate the extent to which the metric is followed.
4. Use the previous information to rate the answer using the evaluation criteria and assign a score.
"""
In [37]:
# Combining groundedness_rater_system_message + llm_prompt + answer for evaluation
groundedness_prompt = f"""[INST]{groundedness_rater_system_message}\n
            {'user'}: {evaluation_user_message_template.format(context=context_for_query, question=evaluation_test_question, answer=answer)}
            [/INST]"""
In [38]:
# Defining a new LLM object
groundedness_checker = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    max_tokens=500,
    top_p=0.95,
    frequency_penalty=1.2,
    stop_sequences=['INST']
)

# Using the LLM-as-Judge for evaluating Groundedness
groundedness_response = groundedness_checker.invoke(groundedness_prompt)
print(groundedness_response.content)
### Steps to Evaluate the Answer

1. **Identify Key Information in the Context**: Extract the main projects mentioned in the context that Microsoft is working on.
2. **Compare with AI Generated Answer**: Check if all three projects listed in the answer are present and accurately described based on information from the context.
3. **Assess Completeness and Accuracy**: Determine if any additional information not found in the context has been included or if any critical details have been omitted.
4. **Rate Adherence to Metric**: Based on how well the answer aligns with only using information from the provided context, assign a score according to evaluation criteria.

### Step-by-Step Explanation of Adherence

1. The question asks for "the three projects" Microsoft is working on, which directly correlates with specific initiatives outlined in detail within the provided context.
2. The AI-generated answer lists:
   - Azure AI Foundry Labs
   - Microsoft 365 Copilot
   - GitHub Copilot & IntelliCode
   
3. Each project is described succinctly but accurately reflects what was presented in detail within each section of context:
   - Azure AI Foundry Labs is correctly identified as an experimental platform aimed at translating advanced research into applications.
   - Microsoft 365 Copilot's role as a productivity assistant embedded across applications aligns perfectly with its description regarding task automation and intelligent assistance.
   - GitHub Copilot & IntelliCode are both mentioned together, highlighting their functions related to coding assistance and productivity improvements for developers.

4. There are no inaccuracies or extraneous details added that deviate from what was presented; thus, it adheres strictly to using only contextual information.

### Evaluation of Metric Adherence

The answer follows all aspects of deriving its content solely from what was presented in the given context without introducing outside knowledge or assumptions about Microsoft's projects.

### Score Assignment

Given that every aspect of adherence has been met completely:

- I would rate this response as a **5**, indicating that "the metric is followed completely."
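For downstream aggregation, the judge's free-text verdict can be reduced to its numeric rating. A minimal regex sketch, assuming the score appears as a standalone 1-5 digit (e.g. `**5**`) and taking the last such digit as the final verdict, a heuristic rather than a guaranteed parse:

```python
import re

def extract_score(judge_response: str):
    """Return the last standalone 1-5 digit in the verdict, or None if absent.
    Heuristic: judges typically state the final score at the end (e.g. '**5**')."""
    matches = re.findall(r"\b([1-5])\b", judge_response)
    return int(matches[-1]) if matches else None

verdict = 'I would rate this response as a **5**, indicating that "the metric is followed completely."'
print(extract_score(verdict))  # → 5
```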
2. Relevance
In [39]:
# Writing the system message and the evaluation metrics for checking the relevance
relevance_rater_system_message = """
You are tasked with rating AI generated answers to questions posed by users.
You will be presented a question, context used by the AI system to generate the answer and an AI generated answer to the question.
In the input, the question will begin with ###Question, the context will begin with ###Context while the AI generated answer will begin with ###Answer.

Evaluation criteria:
The task is to judge the extent to which the metric is followed by the answer.
1 - The metric is not followed at all
2 - The metric is followed only to a limited extent
3 - The metric is followed to a good extent
4 - The metric is followed mostly
5 - The metric is followed completely

Metric:
Relevance measures how well the answer addresses the main aspects of the question, based on the context.
Consider whether all and only the important aspects are contained in the answer when evaluating relevance.

Instructions:
1. First write down the steps that are needed to evaluate the answer as per the metric.
2. Give a step-by-step explanation if the answer adheres to the metric considering the question and context as the input.
3. Next, evaluate the extent to which the metric is followed.
4. Use the previous information to rate the answer using the evaluation criteria and assign a score.
"""
In [40]:
# Combining relevance_rater_system_message + llm_prompt + answer for evaluation
relevance_prompt = f"""[INST]{relevance_rater_system_message}\n
            {'user'}: {evaluation_user_message_template.format(context=context_for_query, question=evaluation_test_question, answer=answer)}
            [/INST]"""
In [41]:
# Defining a new LLM object
relevance_checker = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    max_tokens=500,
    top_p=0.95,
    frequency_penalty=1.2,
    stop_sequences=['INST']
)

# Using the LLM-as-Judge for evaluating Relevance
relevance_response = relevance_checker.invoke(relevance_prompt)
print(relevance_response.content)
### Steps to Evaluate the Context as per the Metric

1. **Identify Key Aspects of the Question**: Determine what specific information is being asked in the question.
2. **Analyze Context for Relevant Information**: Review the context provided to see if it contains information that directly addresses those key aspects.
3. **Check Completeness and Exclusivity**: Ensure that all important aspects of the question are covered in the answer, and verify that no irrelevant details are included.
4. **Assess Clarity and Conciseness**: Evaluate whether the answer is clear, concise, and easy to understand while still being comprehensive.

### Step-by-Step Explanation of Adherence to Metric

1. **Key Aspects of Question**:
   - The user asks for "the three projects on which MSFT is working upon." This indicates a need for a list or description of three specific projects.

2. **Relevant Information from Context**:
   - The context provides detailed descriptions about several initiatives by Microsoft including Azure AI Foundry Labs, Microsoft 365 Copilot, GitHub Copilot & IntelliCode.
   - Each project mentioned includes its purpose and functionality which aligns with what was requested in the question.

3. **Completeness and Exclusivity Check**:
   - The answer lists exactly three projects (Azure AI Foundry Labs, Microsoft 365 Copilot, GitHub Copilot & IntelliCode) as requested by the user.
   - It does not include any extraneous information beyond these projects; thus it remains focused on answering only what was asked.

4. **Clarity and Conciseness Assessment**:
   - The language used in each project description is straightforward without unnecessary jargon or complexity making it accessible for understanding.

### Evaluation Extent
The AI-generated answer effectively identifies all three relevant projects from Microsoft's current initiatives based on provided context while maintaining clarity without introducing irrelevant details or excessive elaboration beyond what's necessary to understand each project's purpose.

### Rating
Based on this analysis:

- All important aspects are contained within a clear structure addressing exactly what was asked (three specific projects).
- There are no irrelevant details included; hence relevance is maintained throughout.

Thus I would rate this response as follows:

Score: 5 (The metric is followed completely)

5. Scoring and Ranking - 3 Marks

image.png

Prompting an LLM to score each company by integrating Quantitative data (stock trend, growth metrics) and Qualitative evidence (PDF insights)

**Your Task**

  1. Write a system message and a user message that outlines the required data for the prompt.
  2. Prompt the LLM to rank and recommend companies for investment based on the provided PDF and stock data to achieve better returns.
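For intuition, the dual-lens blend the LLM is asked to perform can be sketched numerically as a weighted combination of a normalized financial score and an innovation score. The weights and scores below are hypothetical; the notebook delegates the actual blending judgment to the LLM:

```python
def composite_score(financial: float, innovation: float, w_financial: float = 0.6) -> float:
    """Blend a normalized financial score and an innovation score (both in [0, 1])
    using a hypothetical weighting of the two lenses."""
    return w_financial * financial + (1 - w_financial) * innovation

# Hypothetical company: strong financials (0.8), moderate AI initiatives (0.5)
print(round(composite_score(0.8, 0.5), 2))  # → 0.68
```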
In [43]:
# Counting the document chunks stored in the vectorstore
len(vectorstore.get()['documents'])
Out[43]:
62
In [50]:
# Write a system message for instructing the LLM for scoring and ranking the companies
system_message = """
You are a financial analyst assistant. Your task is to evaluate and rank a list of companies for investment potential.

You will be provided with two types of data:
1. Quantitative financial growth metrics such as market capitalization, P/E ratio, P/E Growth Ratio, Total Revenue, Return on Equity (ROE), Free Cash Flow, Price-to-Book (P/B) Ratio, Debt-to-Equity Ratio, Dividend Yield, and beta.
2. Qualitative strategic insights extracted from organizational reports (AI initiatives and other strategic information).

Your goal is to analyze both the quantitative data and qualitative insights to score each company on overall growth potential, innovation, risk, and strategic positioning.

You need to rank the companies from most to least recommended for investment, providing a brief explanation of your reasoning behind the top 3 picks.

Be clear, concise, and justify your rankings based on both data types.
"""
In [46]:
# Write a user message for instructing the LLM for scoring and ranking the companies
user_message = f"""
You are given:

---
### 1. Financial Data (Quantitative)
{df.to_string()}

---
### 2. Strategic Insights (Qualitative)
{vectorstore.get()['documents']}

---

Please score and rank these companies from best to worst investment opportunities. Consider financial growth metrics *and* the qualitative strategic insights.

Provide:
- A ranked list of companies
- Scores or ratings for each company
- Key reasons supporting the rankings, emphasizing strengths and risks

Your evaluation should help an investor decide where to allocate capital for better returns.
The response should be clear and concise so that it renders nicely in Google Colab.
"""
In [51]:
# Formatting the prompt
formatted_prompt = f"""[INST]{system_message}\n
                {'user'}: {user_message}
                [/INST]"""
In [52]:
# Calling the LLM
recommendation = llm.invoke(formatted_prompt)
recommendation.content
Out[52]:
"### Ranked List of Companies\n\n1. **Microsoft (MSFT)**\n   - **Score:** 9.5/10\n2. **NVIDIA (NVDA)**\n   - **Score:** 9/10\n3. **Google (GOOGL)**\n   - **Score:** 8.5/10\n4. **Amazon (AMZN)**\n   - **Score:** 8/10\n5. **IBM (IBM)**\n   - **Score:** 7/10\n6. **Meta Platforms, Inc. (META)**\n   - **Score:** 6/10\n\n---\n\n### Key Reasons Supporting Rankings\n\n#### Microsoft (MSFT)\n- *Strengths:*\n    - Strong financial metrics with a solid market cap and free cash flow.\n    - Aggressive AI initiatives like Azure AI Foundry Labs and Microsoft Copilot that enhance productivity across its software ecosystem.\n    - High return on equity indicates effective management and profitability.\n- *Risks:*\n    - High P/E ratio suggests potential overvaluation; however, growth prospects in AI mitigate this concern.\n\n#### NVIDIA (NVDA)\n- *Strengths:*\n    - Leading position in the GPU market essential for AI applications, driving significant revenue growth.\n    - Innovative projects like DLSS and G-Assist showcase commitment to integrating advanced technologies into consumer products.\n    - Strong financial performance with high revenue growth rates and robust cash flow generation.\n- *Risks:*\n    - High beta indicates volatility; reliance on gaming markets can be risky if demand fluctuates.\n\n#### Google (GOOGL)\n- *Strengths:*\n    - Significant investment in multimodal models through Gemini enhances its competitive edge in search and enterprise solutions.\n    - Established reputation for innovation in AI research through DeepMind, which supports long-term strategic positioning.\n- *Risks:*\n     – Regulatory scrutiny could impact operations; competition from other tech giants is intense.\n\n#### Amazon (AMZN)\n- *Strengths:*\n     – Diverse portfolio leveraging AI across retail, AWS services like SageMaker, Bedrock shows strong integration capabilities for generative applications.\n     – Continuous investment into innovative platforms positions it well within the 
cloud computing space as demand grows for machine learning solutions.\n- *Risks:* \n     – Lower ROE compared to competitors may indicate less efficient capital use; heavy competition from other cloud providers could pressure margins.\n\n#### IBM (IBM)\n- *Strengths:* \n     – Focused on enterprise-grade solutions with initiatives like Granite models that cater to specific industry needs while promoting open-source accessibility which can drive adoption among businesses looking for customizable options without vendor lock-in risks..\n     – Long-standing presence in the technology sector provides stability amidst evolving markets..\n     \n*Risks:* \n      – Slower growth relative to peers due to legacy business challenges; high debt-to-equity ratio raises concerns about financial leverage..\n\n#### Meta Platforms Inc.(META)\n*Strengths:* \n      — Investments into VR & AR technologies alongside ongoing developments around social media platforms provide avenues for future monetization opportunities.. \n\n*Risks:*  \n       — Struggles with user engagement metrics post-pandemic raise questions about long-term viability of advertising revenues.. \n\n---\n\nThis evaluation highlights Microsoft's leadership position driven by innovation coupled with strong fundamentals as the top choice followed closely by NVIDIA's technological advancements within a booming sector . Google remains competitive but faces regulatory pressures while Amazon's diverse offerings keep it relevant despite some operational inefficiencies . IBM lags behind due primarily legacy issues whereas Meta struggles under scrutiny regarding user engagement trends impacting ad revenues negatively overall making them less attractive investments at present time ."
In [54]:
# Render the LLM's recommendation as formatted Markdown instead of raw text
from IPython.display import display, Markdown

display(Markdown(recommendation.content))

Ranked List of Companies

  1. Microsoft (MSFT)
    • Score: 9.5/10
  2. NVIDIA (NVDA)
    • Score: 9/10
  3. Google (GOOGL)
    • Score: 8.5/10
  4. Amazon (AMZN)
    • Score: 8/10
  5. IBM (IBM)
    • Score: 7/10
  6. Meta Platforms, Inc. (META)
    • Score: 6/10

Key Reasons Supporting Rankings

Microsoft (MSFT)

  • Strengths:
    • Strong financial metrics with a solid market cap and free cash flow.
    • Aggressive AI initiatives like Azure AI Foundry Labs and Microsoft Copilot that enhance productivity across its software ecosystem.
    • High return on equity indicates effective management and profitability.
  • Risks:
    • High P/E ratio suggests potential overvaluation; however, growth prospects in AI mitigate this concern.

NVIDIA (NVDA)

  • Strengths:
    • Leading position in the GPU market essential for AI applications, driving significant revenue growth.
    • Innovative projects like DLSS and G-Assist showcase commitment to integrating advanced technologies into consumer products.
    • Strong financial performance with high revenue growth rates and robust cash flow generation.
  • Risks:
    • High beta indicates volatility; reliance on gaming markets can be risky if demand fluctuates.

Google (GOOGL)

  • Strengths:
    • Significant investment in multimodal models through Gemini enhances its competitive edge in search and enterprise solutions.
    • Established reputation for innovation in AI research through DeepMind, which supports long-term strategic positioning.
  • Risks:
    • Regulatory scrutiny could impact operations; competition from other tech giants is intense.

Amazon (AMZN)

  • Strengths:
    • Diverse portfolio leveraging AI across retail; AWS services like SageMaker and Bedrock show strong integration capabilities for generative applications.
    • Continuous investment in innovative platforms positions it well within the cloud computing space as demand grows for machine learning solutions.
  • Risks:
    • Lower ROE compared to competitors may indicate less efficient capital use; heavy competition from other cloud providers could pressure margins.

IBM (IBM)

  • Strengths:
    • Focused on enterprise-grade solutions with initiatives like Granite models that cater to specific industry needs while promoting open-source accessibility, which can drive adoption among businesses seeking customizable options without vendor lock-in risks.
    • Long-standing presence in the technology sector provides stability amidst evolving markets.
  • Risks:
    • Slower growth relative to peers due to legacy business challenges; a high debt-to-equity ratio raises concerns about financial leverage.

Meta Platforms, Inc. (META)

  • Strengths:
    • Investments in VR and AR technologies, alongside ongoing development of its social media platforms, provide avenues for future monetization.
  • Risks:
    • Struggles with user engagement metrics post-pandemic raise questions about the long-term viability of advertising revenues.


This evaluation highlights Microsoft as the top choice, its leadership driven by innovation coupled with strong fundamentals, followed closely by NVIDIA's technological advancements within a booming sector. Google remains competitive but faces regulatory pressures, while Amazon's diverse offerings keep it relevant despite some operational inefficiencies. IBM lags behind primarily due to legacy issues, whereas Meta faces scrutiny over user engagement trends that weigh on ad revenues, making it the least attractive investment at present.

6. Summary and Recommendation - 4 Marks

Based on the project, learners are expected to share their observations, key learnings, and insights related to the business use case, including any challenges they encountered. Additionally, they should recommend improvements to the project and suggest further steps for enhancement.

A. Summary / Your Observations about this Project - 2 Marks

  1. The project effectively combines financial growth metrics with strategic insights from organizational reports using a RAG-based DualLens approach.

  2. Investment rankings align with market leaders (Microsoft, NVIDIA, Google), indicating accurate retrieval and meaningful synthesis by the LLM.

  3. The approach improves explainability, as recommendations are supported by both quantitative data and qualitative strategy analysis.

B. Recommendations for this Project / What improvements can be made to this Project - 2 Marks

  1. Introduce a structured, weighted scoring framework to improve consistency and reduce subjectivity in LLM-generated scores.

  2. Add time-aware retrieval and risk/confidence indicators to avoid outdated strategic insights and improve reliability.

  3. Validate the system through backtesting and portfolio-level analysis to measure real-world investment performance.

  4. Incorporate more data points from the Yahoo Finance library (yfinance) for deeper fundamental analysis.

  5. Provide two separate recommendations, one for short-term and one for long-term investments.
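Recommendation 1 above (a structured, weighted scoring framework) could be prototyped with a small helper that blends the two lenses deterministically instead of relying on a single free-form LLM score. The sketch below is illustrative only: the weight values, metric names, and sub-scores are assumptions for demonstration, not outputs of the actual pipeline; in practice the 0-10 sub-scores would come from normalized yfinance metrics and the RAG analysis of strategy PDFs.

```python
# Minimal weighted-scoring sketch (hypothetical weights and sub-scores).
# Each sub-score is on a 0-10 scale; weights must sum to 1.

def weighted_score(metrics, weights):
    """Return the weighted average of 0-10 sub-scores, rounded to 2 decimals."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return round(sum(metrics[k] * w for k, w in weights.items()), 2)

WEIGHTS = {
    "revenue_growth":   0.25,  # quantitative lens (from stock data)
    "free_cash_flow":   0.20,
    "return_on_equity": 0.15,
    "ai_initiatives":   0.40,  # qualitative lens (from RAG over strategy PDFs)
}

# Illustrative sub-scores for two of the ranked companies
companies = {
    "MSFT": {"revenue_growth": 8.5, "free_cash_flow": 9.0,
             "return_on_equity": 9.5, "ai_initiatives": 9.5},
    "META": {"revenue_growth": 7.0, "free_cash_flow": 7.5,
             "return_on_equity": 8.0, "ai_initiatives": 6.0},
}

# Rank tickers by their blended score, highest first
ranked = sorted(companies,
                key=lambda t: weighted_score(companies[t], WEIGHTS),
                reverse=True)
```

Making the weights explicit keeps the ranking reproducible across runs and lets the LLM's role shift from assigning a single holistic number to justifying each sub-score, which also supports the explainability observation in Section A.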
